Dictionaries and distributions: Combining expert knowledge and large scale textual data content analysis: Distributed dictionary representation
Abstract
Theory-driven text analysis has made extensive use of psychological concept dictionaries, leading to a wide range of important results. These dictionaries have generally been applied through word count methods, which have proven to be both simple and effective. In this paper, we introduce Distributed Dictionary Representations (DDR), a method that applies psychological dictionaries using semantic similarity rather than word counts. This allows for the measurement of the similarity between dictionaries and spans of text ranging from complete documents to individual words. We show how DDR enables dictionary authors to place greater emphasis on construct validity without sacrificing linguistic coverage. We further demonstrate the benefits of DDR on two real-world tasks and conduct an extensive study of the interaction between dictionary size and task performance. These studies allow us to examine how DDR and word count methods complement one another as tools for applying concept dictionaries and where each is best applied. Finally, we provide references to tools and resources to make this method both available and accessible to a broad psychological audience.
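The contrast between word counts and semantic similarity can be illustrated with a minimal sketch. The abstract does not say how the similarity is computed, so the sketch assumes that a dictionary and a text are each represented by the mean of their word vectors and compared with cosine similarity. The vectors in `TOY_VECTORS` are invented for illustration, and the function names are hypothetical rather than taken from the authors' tools.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# A real application would load pretrained word embeddings instead.
TOY_VECTORS = {
    "happy": [0.90, 0.10, 0.00],
    "joy": [0.80, 0.20, 0.10],
    "delighted": [0.88, 0.12, 0.02],
    "sad": [-0.80, 0.10, 0.20],
    "tax": [0.00, 0.90, 0.30],
    "report": [0.05, 0.80, 0.40],
}


def mean_vector(words, vectors):
    """Average the vectors of the words that have one; None if no word is known."""
    found = [vectors[w] for w in words if w in vectors]
    if not found:
        return None
    dim = len(found[0])
    return [sum(v[i] for v in found) / len(found) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def ddr_similarity(dictionary_words, text, vectors):
    """Assumed scoring: cosine between the mean dictionary vector and the mean text vector."""
    concept = mean_vector(dictionary_words, vectors)
    document = mean_vector(text.lower().split(), vectors)
    if concept is None or document is None:
        return 0.0
    return cosine(concept, document)


def word_count_score(dictionary_words, text):
    """Baseline: fraction of whitespace-separated tokens that appear in the dictionary."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    vocabulary = set(dictionary_words)
    return sum(1 for t in tokens if t in vocabulary) / len(tokens)


positive = ["happy", "joy"]  # a deliberately small seed dictionary
for text in ["delighted", "tax report"]:
    print(text, word_count_score(positive, text), ddr_similarity(positive, text, TOY_VECTORS))
```

With these toy vectors, the text "delighted" contains no dictionary word and so scores zero under word counting, yet it lies close to the dictionary in the vector space. This is the sense in which a small, carefully chosen dictionary could still cover related vocabulary.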
Similar resources
A New Dictionary Construction Method in Sparse Representation Techniques for Target Detection in Hyperspectral Imagery
Hyperspectral data in remote sensing, which have been gathered with efficient spectral resolution (about 10 nanometers), contain a plethora of spectral bands (roughly 200 bands). Since precious information about the spectral features of target materials can be extracted from these data, they have been used exclusively in hyperspectral target detection. One of the problems associated with the detect...
ILLINOIS-PROFILER: Knowledge Schemas at Scale
In many natural language processing tasks, contextual information from given documents alone is not sufficient to support the desired textual inference. In such cases, background knowledge about certain entities and concepts could be quite helpful. While many knowledge bases (KBs) focus on combining data from existing databases, including dictionaries and other human generated knowledge, we obs...
Creating a Comparative Dictionary of Totonac-Tepehua
We apply algorithms for the identification of cognates and recurrent sound correspondences proposed by Kondrak (2002) to the Totonac-Tepehua family of indigenous languages in Mexico. We show that by combining expert linguistic knowledge with computational analysis, it is possible to quickly identify a large number of cognate sets within the family. Our objective is to provide tools for rapid co...
Using the Textual Content of the LMF-Normalized Dictionaries for Identifying and Linking the Syntactic Behaviors to the Meanings
In this paper we propose an approach for identifying syntactic behaviours related to lexical items and linking them to the meanings. This approach is based on the analysis of the textual content presented in LMF normalized dictionaries by means of Definition and Context classes. The main particularity of these contents is their large availability and their semantically control due to their atta...
Combining Dictionary-Based and Example-Based Methods for Natural Language Analysis
We propose combining dictionary-based and example-based natural language (NL) processing techniques in a framework that we believe will provide substantive enhancements to NL analysis systems. The centerpiece of this framework is a relatively large-scale lexical knowledge base that we have constructed automatically from an online version of Longman's Dictionary of Contemporary English (LDOCE), ...
Journal title:
- Behavior Research Methods
Volume 50, Issue 1
Pages -
Publication date: 2018